Dates and intervals #8
56dbb6e to
27723f9
Compare
benesch
left a comment
I haven't looked through this in great detail, but it all seems quite sensible! Feel free to merge whenever you think it's ready.
How durationlike INTERVALs are actually handled
I'm going to walk through the 2016 spec, my understanding of it, and what other DBs are doing. An interval is defined as An where The spec on page 41 states:
and on page 746 it says:
And 747 says:
There might be other semantic definitions of intervals in SQL '16, and '92 is just worded slightly differently, but I can't find them. I misinterpreted the fact that there is a Additionally, the words in the spec inline above seem to say that if you use the full Instead, postgres just uses the smallest precision as the smallest value, and shifts things around to make that work:
-- a second, nice
SELECT INTERVAL '1' SECOND;
00:00:01
-- huh, this is 1 hour 2 minutes, I would expect it to be 1 minute 2 seconds
SELECT INTERVAL '1:2' SECOND;
01:02:00
-- okay if there is a decimal it must be a second
SELECT INTERVAL '2:3.0004' MINUTE;
00:02:00
SELECT INTERVAL '2:3.0004' SECOND;
00:02:03.0004
-- this is more expected
SELECT INTERVAL '1:2:3' SECOND;
01:02:03
-- this is also reasonable with the new semantics
SELECT INTERVAL '1:2:3' MINUTE;
01:02:00
-- I would expect this to be 00:02:03
SELECT INTERVAL '1:2:3' MINUTE TO SECOND;
01:02:03
-- also pretty reasonable
SELECT INTERVAL '9 1:2' SECOND;
9 days 01:02:00
MySQL on the other hand just has really bad handling of things:
Almost none of that is the way that it works in this PR.
Monthlike intervals
Okay, I think that's all the things about duration-like datetimes. Let's look at months, about which the spec has this to say:
Seems clear enough, let's look at an obvious failure:
SELECT INTERVAL '8-9 1:2:3.0004' SECOND;
8 years 9 mons 01:02:03.0004
oh come on.
SELECT INTERVAL '7-8-9 1:2:3.0004' SECOND;
ERROR: invalid input syntax for type interval: "7-8-9 1:2:3.0004"
LINE 1: SELECT INTERVAL '7-8-9 1:2:3.0004' SECOND;
SELECT INTERVAL '7-8 9 1:2:3.0004' SECOND;
7 years 8 mons 9 days 01:02:03.0004
Postgres supports combinations of durationlike and timelike intervals 🤷♀️ MySQL returns null, similar to what it does for things that should really be legal. I'm going to spend the weekend thinking about what the right thing to do here is.
Plan
My current inclination is to make the parsing and I do not plan to support combining year-month and durationlike intervals. The postgres algorithm appears to be:
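Judging only from the Postgres outputs earlier in this comment, the colon-separated part of the string seems to be assigned to fields roughly as in the sketch below. This is an inference from those examples, not Postgres's actual implementation: `Field`, `Hms`, and `assign_like_postgres` are hypothetical names, the hour and minute cases for a single bare number are assumed by analogy with the `SECOND` example, and the leading day and year-month parts are left out.

```rust
/// Least-significant field named by the specifier (hypothetical enum).
/// Declaration order gives Hour < Minute < Second under the derived ordering.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
enum Field {
    Hour,
    Minute,
    Second,
}

/// The time portion of an interval, in the shape Postgres prints it.
#[derive(Debug, PartialEq)]
struct Hms {
    hours: u64,
    minutes: u64,
    seconds: f64,
}

/// Assign colon-separated parts to fields the way the observed outputs suggest,
/// then drop everything below the specifier's least-significant field.
fn assign_like_postgres(text: &str, last: Field) -> Option<Hms> {
    let parts: Vec<&str> = text.split(':').collect();
    let mut out = match parts.as_slice() {
        // A single number goes to the field named by the specifier
        // (only the SECOND case appears in the examples; the rest is assumed).
        [only] => match last {
            Field::Hour => Hms { hours: only.parse().ok()?, minutes: 0, seconds: 0.0 },
            Field::Minute => Hms { hours: 0, minutes: only.parse().ok()?, seconds: 0.0 },
            Field::Second => Hms { hours: 0, minutes: 0, seconds: only.parse().ok()? },
        },
        // Two parts with a decimal point: minutes and seconds.
        [m, s] if s.contains('.') => Hms { hours: 0, minutes: m.parse().ok()?, seconds: s.parse().ok()? },
        // Two parts without a decimal point: hours and minutes.
        [h, m] => Hms { hours: h.parse().ok()?, minutes: m.parse().ok()?, seconds: 0.0 },
        // Three parts: hours, minutes, seconds, regardless of the specifier.
        [h, m, s] => Hms { hours: h.parse().ok()?, minutes: m.parse().ok()?, seconds: s.parse().ok()? },
        _ => return None,
    };
    // Truncate fields below the specifier's least-significant field.
    if last < Field::Second {
        out.seconds = 0.0;
    }
    if last < Field::Minute {
        out.minutes = 0;
    }
    Some(out)
}
```

Under these assumptions, `assign_like_postgres("1:2", Field::Second)` yields one hour and two minutes, matching the surprising `01:02:00` result shown earlier.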
Exciting followup, MySQL edition. MySQL is a lot closer to my interpretation of the spec:
SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '1:2:3' SECOND);
+---------------------+
| 2000-01-01 00:00:01 |
+---------------------+
SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '1:2:3' HOUR_SECOND);
+---------------------+
| 2000-01-01 01:02:03 |
+---------------------+
SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '9 1:2:3' DAY_SECOND) as result;
+---------------------+
| 2000-01-10 01:02:03 |
+---------------------+
SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '1:2:3' MINUTE_SECOND);
+--------+
| NULL |
+--------+
SHOW WARNINGS;
+---------+------+-----------------------------------------------------+
| Level | Code | Message |
+---------+------+-----------------------------------------------------+
| Warning | 1441 | Datetime function: date_add_interval field overflow |
+---------+------+-----------------------------------------------------+
--
-- MySQL cannot handle fractional seconds unless they are the only thing in the interval!
SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '1:3.004' MINUTE_SECOND);
---------+
-- NULL |
---------+
SELECT DATE_ADD('2000-01-01 00:00:00', INTERVAL '3.004' SECOND);
-------------------------------+
-- 2000-01-01 00:00:03.004000 |
-------------------------------+
And when you specify an interval with no field specifier in postgres you get:
SELECT INTERVAL '1:2:3';
|------------|
| 1:02:03 |
+------------+
SELECT INTERVAL '2:3';
|------------|
| 2:03:00 |
+------------+
SELECT INTERVAL '1';
|------------|
| 0:00:01 |
+------------+
For most uses the most important takeaway is that:
I think my plan has changed to maintaining the current parsing rules that I've implemented, and also exposing a convenience method that lets SQL engines ask "were more fields present in the interval string than in the field specifier?" so that they can provide nice errors.
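A minimal sketch of what such a check could look like. All names here (`DateTimeField`, `ParsedFields`, `fields_match`) are hypothetical and not the types in this PR; the idea is only that the parser records which fields it saw, and the engine compares them against the range the specifier covers.

```rust
/// Day-time fields, most significant first (hypothetical enum).
/// Declaration order gives Day < Hour < Minute < Second under the derived ordering.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
enum DateTimeField {
    Day,
    Hour,
    Minute,
    Second,
}

/// Values found in the interval string; `None` means the field was absent.
#[derive(Debug, Default)]
struct ParsedFields {
    day: Option<u64>,
    hour: Option<u64>,
    minute: Option<u64>,
    second: Option<u64>,
}

impl ParsedFields {
    /// Returns an error if the string carried a field outside the
    /// `leading..=last` range that the field specifier covers.
    fn fields_match(&self, leading: DateTimeField, last: DateTimeField) -> Result<(), String> {
        let present = [
            (DateTimeField::Day, self.day.is_some()),
            (DateTimeField::Hour, self.hour.is_some()),
            (DateTimeField::Minute, self.minute.is_some()),
            (DateTimeField::Second, self.second.is_some()),
        ];
        for &(field, is_present) in present.iter() {
            if is_present && (field < leading || field > last) {
                return Err(format!(
                    "interval string has a {:?} value, but the specifier only covers {:?} to {:?}",
                    field, leading, last
                ));
            }
        }
        Ok(())
    }
}
```

With this shape, a string parsed as hours, minutes, and seconds passes for `HOUR TO SECOND` but is rejected for plain `SECOND`, which is the case where Postgres silently accepts the extra fields.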
27723f9 to
4d0f5c7
Compare
This allows parser-users to be more restrictive in the values that they accept if they want to use the `IntervalValue::computed` method, because afaict no databases in existence actually match any SQL spec for their interpretation of interval literal values[1]. [1]: #8 (comment)
4d0f5c7 to
c698a8e
Compare
expected_duration_str: Option<&str>,
expected_field_match_erro: Option<&str>,
) {
println!("testing: {}", sql);
This makes it easier to tell which SQL statement caused a test failure, since many of the tests cover many statements. It is silent if the tests pass; I added a comment explaining that it's intentional.
value: IntervalValue,
expected_computed: Interval,
expected_duration_str: Option<&str>,
expected_field_match_erro: Option<&str>,
Is "erro" a standard abbreviation for "err option", or a typo?
c698a8e to
cba2588
Compare
cba2588 to
5d91c38
Compare
ParsedDateTimes just represent all the values that were present in a datetime string. For a regular `DateTime` object they will all be present, but for `INTERVAL` they depend on the following token.
This will allow us to put some methods on it.
Knowing that we've got some optional number of years/seconds/minutes in a `ParsedDateTime` isn't nearly as useful as being able to state that you have 13 months or 250,000 milliseconds. Now we can get either of those.
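As an illustration of that conversion, here is a sketch under assumed types. The struct fields, the `Interval` enum shape, and the `compute` method are hypothetical stand-ins, not the definitions in this PR; the sketch also rejects strings that mix year-month and day-time fields, in line with the plan stated earlier in the thread.

```rust
use std::time::Duration;

/// Optional fields as they appeared in the string (hypothetical layout).
#[derive(Debug, Default)]
struct ParsedDateTime {
    year: Option<u64>,
    month: Option<u64>,
    day: Option<u64>,
    hour: Option<u64>,
    minute: Option<u64>,
    second: Option<u64>,
}

/// Either a count of months or a fixed-length duration (hypothetical enum).
#[derive(Debug, PartialEq)]
enum Interval {
    Months(u64),
    Duration(Duration),
}

impl ParsedDateTime {
    /// Collapse the optional fields into a single comparable quantity.
    fn compute(&self) -> Result<Interval, String> {
        let has_months = self.year.is_some() || self.month.is_some();
        let has_duration = self.day.is_some()
            || self.hour.is_some()
            || self.minute.is_some()
            || self.second.is_some();
        match (has_months, has_duration) {
            // Months have no fixed length, so the two kinds cannot be summed.
            (true, true) => Err("cannot mix year-month and day-time fields".to_string()),
            (true, false) => Ok(Interval::Months(
                self.year.unwrap_or(0) * 12 + self.month.unwrap_or(0),
            )),
            _ => {
                let hours = self.day.unwrap_or(0) * 24 + self.hour.unwrap_or(0);
                let minutes = hours * 60 + self.minute.unwrap_or(0);
                let seconds = minutes * 60 + self.second.unwrap_or(0);
                Ok(Interval::Duration(Duration::from_secs(seconds)))
            }
        }
    }
}
```

Here one year plus one month computes to 13 months, and four minutes plus ten seconds computes to a duration of 250,000 milliseconds.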
This allows parser-users to be more restrictive in the values that they accept if they want to use the `IntervalValue::computed` method, because afaict no databases in existence actually match any SQL spec for their interpretation of interval literal values[1]. [1]: #8 (comment)
This uses some of the infrastructure added for Intervals to support extracting the fields of `DATE 'yyyy-mm-dd'`. It has a similar philosophical position -- it just extracts the fields into a struct that has numeric fields for the dates, and does very little verification. The only check is that month and day are not 0 and fit into a u8, since the exact number of days in any given month varies.
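A rough sketch of that level of validation, assuming a plain `yyyy-mm-dd` string. `ParsedDate`, `parse_date_fields`, and `nonzero_u8` are hypothetical names rather than the code in this commit.

```rust
/// Numeric date fields with no calendar validation (hypothetical struct).
#[derive(Debug, PartialEq)]
struct ParsedDate {
    year: i64,
    month: u8,
    day: u8,
}

/// Parse a field that must be non-zero and fit into a u8.
fn nonzero_u8(name: &str, text: &str) -> Result<u8, String> {
    match text.parse::<u8>() {
        Ok(0) => Err(format!("{} cannot be zero", name)),
        Ok(value) => Ok(value),
        Err(e) => Err(format!("{} {:?} does not fit in a u8: {}", name, text, e)),
    }
}

/// Split `yyyy-mm-dd` into its fields. Day-of-month is deliberately not
/// checked against the month, so 2000-02-31 is accepted.
fn parse_date_fields(s: &str) -> Result<ParsedDate, String> {
    let parts: Vec<&str> = s.split('-').collect();
    let (y, m, d) = match parts.as_slice() {
        [y, m, d] => (*y, *m, *d),
        _ => return Err(format!("expected yyyy-mm-dd, got {:?}", s)),
    };
    let year = y
        .parse::<i64>()
        .map_err(|e| format!("bad year {:?}: {}", y, e))?;
    let month = nonzero_u8("month", m)?;
    let day = nonzero_u8("day", d)?;
    Ok(ParsedDate { year, month, day })
}
```

Leaving the days-per-month check to the consumer keeps the parser free of calendar logic, at the cost of letting through values such as `2000-02-31`.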
43389b5 to
6b489dd
Compare
This isn't ready yet but I think all the types are in place.
There are a couple of breaking changes here, which I could fix by adding a `ParsedDateTime::from_value(str, leading_field)` that computes lazily, but that feels like it breaks the overall style of sqlparser of having everything fully parsed as is.
Still obviously left to do:
`Date`, `Time`, and `DateTime` parsing, which should just be a matter of using the parsing code in this commit.